Street trees, especially in urban areas, contribute to environmental sustainability and biodiversity. Their benefits include reducing pollutants and carbon emissions, supporting physical and mental well-being, preventing floods, and increasing property values. New York City (NYC), one of the world's largest cities, has been implementing various tree planting and protection programs with the help of volunteers and non-profit organizations.
For this project I use the dataset from the NYC 2015 Street Tree Census, collected by the NYC Department of Parks and Recreation. The data was collected across the 5 boroughs of NYC (Manhattan, Bronx, Queens, Staten Island, Brooklyn). The data dictionary that accompanies the dataset provides more information about each field.
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("2015_Street_Tree_Census.csv")
df.head()
df.shape
df.info()
tree_df = df.drop(['block_id',
'spc_latin',
'user_type',
'address',
'community board',
'cncldist',
'st_assem',
'st_senate',
'boro_ct',
'state',
'council district',
'census tract',
'bin',
'bbl',
'zip_city',
'created_at',
'postcode',
'steward',
'curb_loc'], axis = 1)
tree_df.shape
null_data = tree_df.isnull().sum()
null_data[null_data > 0]
tree_df['status'].value_counts()
tree_df[(tree_df.status == "Alive") & (tree_df.spc_common.isnull())]
The dataset contains a total of 652,173 trees that are "Alive". The remaining 31,615 trees are either "Dead" or "Stump". As mentioned in the data dictionary, most fields were not collected for stumps and dead trees, which accounts for the null values. The species name ("spc_common") is missing for five of the trees that are alive. For the purpose of this analysis, stumps, dead trees, and trees with a missing species name are excluded.
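To check that the null values really do come from dead trees and stumps, the missing values can be counted separately for each status. The snippet below is a minimal sketch on a small invented frame; the helper name `nulls_by_status` and the toy rows are assumptions for illustration, and in this notebook the same call would be made on tree_df.

```python
import pandas as pd

# Toy stand-in for tree_df; the rows are invented for illustration only.
toy = pd.DataFrame({
    "status": ["Alive", "Alive", "Dead", "Stump", "Alive"],
    "spc_common": ["honeylocust", None, None, None, "pin oak"],
    "health": ["Good", "Fair", None, None, "Good"],
})

def nulls_by_status(frame, columns):
    """Count missing values in `columns`, separately for each tree status."""
    return frame[columns].isnull().groupby(frame["status"]).sum()

null_table = nulls_by_status(toy, ["spc_common", "health"])
print(null_table)
```

If almost all of the missing values fall in the "Dead" and "Stump" rows, filtering on status removes them without losing information about living trees.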
flt_data = tree_df[(tree_df.status == "Alive") & (tree_df.spc_common.notnull())]
flt_data.shape
print(f"Total number of species: {flt_data['spc_common'].nunique()}")
plt.figure(figsize = (10,8))
ax=sns.countplot(x="borough", data=flt_data, order = flt_data['borough'].value_counts().index, palette="muted")
ax.set(xlabel="NYC Boroughs", ylabel = "Total number of trees", title="Tree count in each of the boroughs")
for p in ax.patches:
    ax.annotate('{}'.format(p.get_height()), (p.get_x()+0.2, p.get_height()+50))
Of the five boroughs, Queens has the most trees and Manhattan has the fewest. Land area is likely the primary reason Manhattan has comparatively fewer trees: Manhattan covers 22.8 sq miles, whereas Queens covers 108.1 sq miles.
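One way to test the land-area explanation is to divide each borough's tree count by its area. This is a rough sketch, not part of the original analysis: it uses only the two land areas quoted above, the helper `trees_per_sq_mile` is a hypothetical name, and the toy borough column is invented. In the notebook the input would be flt_data["borough"].

```python
import pandas as pd

# Land areas in square miles, taken from the figures quoted in the text.
land_area_sq_mi = pd.Series({"Manhattan": 22.8, "Queens": 108.1})

def trees_per_sq_mile(boroughs, areas):
    """Divide the number of trees in each borough by that borough's land area."""
    counts = boroughs.value_counts()
    # Boroughs without a known area become NaN after alignment and are dropped.
    return (counts / areas).dropna()

# Toy borough column, invented for illustration only.
toy_boroughs = pd.Series(["Queens"] * 6 + ["Manhattan"] * 3)
density = trees_per_sq_mile(toy_boroughs, land_area_sq_mi)
print(density)
```

If the per-square-mile figures for the two boroughs turn out to be similar on the real data, land area alone explains most of the difference in raw counts.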
plt.figure(figsize = (15,8))
ax=sns.countplot(y="spc_common", data=flt_data, palette="muted", order = flt_data['spc_common'].value_counts().iloc[:20].index)
ax.set(xlabel="Total number of trees", ylabel = "Species name", title="Distribution of Species")
total = len(flt_data['spc_common'])
for p in ax.patches:
    percentage = '{:.1f}%'.format(100 * p.get_width()/total)
    x = p.get_x() + p.get_width() + 0.02
    y = p.get_y() + p.get_height()/2
    ax.annotate(percentage, (x, y))
London planetree (13.3%) is the most common of the 132 species, followed by honeylocust (9.9%).
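The same percentages can be read as a table instead of from the bar labels. The sketch below uses pandas' `value_counts(normalize=True)`; the function name `species_share` and the toy species column are assumptions for illustration, and in the notebook the input would be flt_data["spc_common"].

```python
import pandas as pd

def species_share(species, top=20):
    """Percentage of all trees that belong to each of the `top` most common species."""
    share = species.value_counts(normalize=True) * 100
    return share.head(top).round(1)

# Toy species column, invented for illustration only.
toy_species = pd.Series(
    ["London planetree"] * 5 + ["honeylocust"] * 3 + ["pin oak"] * 2
)
shares = species_share(toy_species, top=2)
print(shares)
```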
g = flt_data.groupby("borough")
# The ten most common species citywide, used as the bar order in every panel
top10 = flt_data['spc_common'].value_counts().iloc[:10].index
plt.subplots_adjust(top=3.0, bottom=0.5, left=0.9, right=3.0, hspace=0.5, wspace=0.35)
boroughs = ["Manhattan", "Queens", "Brooklyn", "Bronx", "Staten Island"]
# One panel per borough on a 3x2 grid
for i, boro in enumerate(boroughs, start=1):
    plt.subplot(3, 2, i)
    sns.countplot(y="spc_common", data=g.get_group(boro), order=top10)
    plt.xlabel("Count")
    plt.ylabel("Species")
    plt.title(boro)
For these count plots I selected only the 10 most common species citywide and checked their counts in each borough. Honeylocust is widely spread across the Bronx and Manhattan, whereas London planetree is predominant in Queens and Brooklyn. Callery pear seems to be more common in Staten Island. Ginkgo and Sophora are less common in Staten Island than in the other boroughs.
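Because the boroughs differ a lot in total tree count, raw counts can make a species look rarer in a small borough than it really is. A possible complement, sketched here with `pd.crosstab` and `normalize="index"`, is to compare each species' share of the trees within a borough. The helper `species_share_by_borough` and the toy rows are assumptions for illustration; in the notebook the frame would be flt_data.

```python
import pandas as pd

def species_share_by_borough(frame, species):
    """Percentage of each borough's trees that belong to each listed species."""
    # normalize="index" makes every borough row sum to 1 across all species.
    table = pd.crosstab(frame["borough"], frame["spc_common"], normalize="index") * 100
    return table.reindex(columns=species, fill_value=0.0)

# Toy rows, invented for illustration only.
toy = pd.DataFrame({
    "borough": ["Bronx"] * 4 + ["Queens"] * 4,
    "spc_common": ["honeylocust", "honeylocust", "London planetree", "pin oak",
                   "London planetree", "London planetree", "London planetree", "honeylocust"],
})
by_borough = species_share_by_borough(toy, ["honeylocust", "London planetree"])
print(by_borough)
```

The shares are computed over all trees in a borough before the columns are restricted, so the listed species do not need to sum to 100%.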
import json
import geopandas as gpd
from area import area
import plotly.graph_objects as go
import plotly.io as pio
with open("NYC_geodata.geojson") as f:
    nyc_data = json.load(f)
d = {}  # maps each NTA code to its area in square miles
neighborhood = nyc_data["features"]
for n in neighborhood:
    code = n["properties"]["ntacode"]
    a = area(n["geometry"])/(1609.344*1609.344) # converts from m^2 to mi^2 (1 mi = 1609.344 m)
    d[code] = a
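flt_data = flt_data.copy()  # work on a copy so adding columns does not raise SettingWithCopyWarning
flt_data["area"] = flt_data["nta"].map(d)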
flt_data = flt_data.dropna(subset=["area"])
flt_data['count_trees'] = flt_data.groupby('nta')['nta'].transform('count')
flt_data["density"] = flt_data["count_trees"]/flt_data["area"]
flt_data.head()
import math
import plotly.express as px
import plotly.offline as pyo
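# One row per neighborhood is enough for the map; passing every tree row would bloat the figure
nta_density = flt_data.drop_duplicates(subset="nta")
fig = px.choropleth_mapbox(nta_density,
                           geojson=nyc_data,
                           locations="nta",
                           featureidkey="properties.ntacode",
                           color="density",
                           color_continuous_scale="viridis",
                           mapbox_style="carto-positron",
                           zoom=9, center={"lat": 40.7, "lon": -73.9},
                           opacity=0.7,
                           hover_name="nta_name"
                           )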
fig.show()
fig.write_html("myplot.html")